Effective Topic Distillation with Key Resource Pre-selection

Authors

  • Yiqun Liu
  • Min Zhang
  • Shaoping Ma
Abstract

Topic distillation aims to find key resources, that is, high-quality pages on a given topic. Based on an analysis of the non-content features of key resources, we introduce a pre-selection method for topic distillation. A decision tree is constructed to locate key resource pages using query-independent non-content features, including in-degree, document length, URL type, and two new features derived from an analysis of each site's self-link structure. Although the resulting page set contains only about 20% of the pages in the whole collection, it covers more than 70% of the key resources. Furthermore, retrieval on this page set yields an improvement of more than 60% over retrieval on all pages. These results were obtained using the TREC 2002 web track topic distillation task for training and the corresponding TREC 2003 task for testing. They show an effective way of achieving better topic distillation performance with a significantly smaller dataset.
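
The pre-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' decision tree: the abstract gives neither the tree structure nor its thresholds, so the rules, the cut-off values (`min_in_degree`, `min_length`), the `Page` record, and the four-way URL typing (ROOT, SUBROOT, PATH, FILE) are all assumptions made for illustration. The two self-link features mentioned in the abstract are not modelled here.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Page:
    """Hypothetical record holding query-independent, non-content features."""
    url: str
    in_degree: int  # number of links pointing at the page
    length: int     # document length in words


def url_type(url: str) -> str:
    """Classify a URL as ROOT, SUBROOT, PATH or FILE (assumed typing scheme)."""
    path = urlparse(url).path
    parts = [p for p in path.split("/") if p]
    ends_with_file = bool(parts) and not path.endswith("/") and "." in parts[-1]
    # Treat a trailing index page as if the URL ended at its directory.
    if ends_with_file and parts[-1].lower() in ("index.html", "index.htm"):
        parts.pop()
        ends_with_file = False
    if ends_with_file:
        return "FILE"
    if not parts:
        return "ROOT"
    return "SUBROOT" if len(parts) == 1 else "PATH"


def is_candidate(page: Page, min_in_degree: int = 10, min_length: int = 50) -> bool:
    """Hand-written stand-in for a learned tree; thresholds are illustrative only."""
    kind = url_type(page.url)
    if kind in ("ROOT", "SUBROOT"):
        return True
    if kind == "PATH":
        return page.in_degree >= min_in_degree
    # FILE pages must be both well linked and reasonably long.
    return page.in_degree >= min_in_degree and page.length >= min_length


def coverage(pages: list[Page], key_urls: set[str]) -> tuple[float, float]:
    """Return (share of the collection kept, share of key resources kept)."""
    kept = [p for p in pages if is_candidate(p)]
    kept_share = len(kept) / len(pages)
    key_recall = sum(p.url in key_urls for p in kept) / len(key_urls)
    return kept_share, key_recall
```

Calling `coverage(pages, key_urls)` on a labelled collection returns the two quantities the abstract reports: how much of the collection survives pre-selection and how many key resources it still contains. In practice the rules would be learned from training judgments rather than written by hand.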

Similar resources

UMass at TREC 2007 Blog Distillation Task

The focus of the blog distillation task is finding blogs with a principal, recurring interest in a specific topic. For this task, we considered a blog as a collection of postings and used resource selection approaches. Further, we investigated techniques that penalized general blogs and combined resource selection techniques. This combination demonstrated significant improvements over baselines.

Subsite Retrieval: A Novel Concept for Topic Distillation

Topic distillation is one of the main information needs when users search the Web. In previous approaches to topic distillation, the single page was treated as the basic searching unit. This strategy is inherited from general information retrieval, which has not fully utilized the structure information of the Web. In this paper, we propose a novel concept for topic distillation, named subsite r...

An Implementation Model for Courses in Human Resource Training

M. Sami'ee Zafarghandi, Ph.D. To arrive at an effective model for implementing courses in human resource training, different approaches to this task were critically reviewed and their constructive aspects reutilized. The new approach was then tested to identify any probable defect. The final model consists of basic elem...

Integration of Design and Control for Energy Integrated Distillation

A systematic computer aided analysis of the process model is used as a pre-solution step for integration of design and control problem. In this paper, a static energy integrated distillation plant model is first presented, then the analysis is presented for a single distillation column, subsequently the column analysis is extended with the analysis for the heat pump. The analysis relates to the...

UIC at TREC-2002: Web Track (Draft)

This is the first year that members of the Database and Information System Lab (DBIS) at University of Illinois at Chicago (UIC) participate in TREC. We participate in two tasks for the Web track: topic distillation and named page finding. Linkage information among documents as well as content information about documents is used in some of our submitted runs. We utilize the Okapi weighting sche...

Publication date: 2004